Workload placement based upon CRAC unit capacity utilizations

ABSTRACT

In a method of workload placement based upon capacity utilizations of a plurality of CRAC units, the provisioning of the plurality of CRAC units is determined and the zone of influence for each of the plurality of CRAC units is also determined based upon the provisioning of the plurality of CRAC units. It is determined whether a CRAC unit of the plurality of CRAC units is at least one of near failure or has failed. In addition, the workload is shifted from a zone of influence of at least one CRAC unit to a zone of influence of another of the plurality of CRAC units in response to a determination that a CRAC unit is at least one of near failure or has failed.

BACKGROUND

A data center may be defined as a location, for instance, a room that houses computer systems arranged in a number of racks. A standard rack, for example, an electronics cabinet, is defined as an Electronics Industry Association (EIA) enclosure, 78 in. (2 meters) high, 24 in. (0.61 meter) wide and 30 in. (0.76 meter) deep. These racks are configured to house a number of computer systems, about forty (40) systems, with future configurations of racks being designed to accommodate 200 or more systems. The computer systems typically include a number of printed circuit boards (PCBs), mass storage devices, power supplies, processors, micro-controllers, and semi-conductor devices that dissipate relatively significant amounts of heat during their operation. For example, a typical computer system comprising multiple microprocessors dissipates approximately 250 W of power. Thus, a rack containing forty (40) computer systems of this type dissipates approximately 10 kW of power.

In relatively large data centers, a plurality of computer room air conditioning (CRAC) units are variously positioned to provide cooling airflow to the computer systems. In this regard, the CRAC units are typically positioned to provide cooling airflow to respective ones of the computer systems. If one of the CRAC units were to fail in providing sufficient levels of cooling airflow to its associated computer systems, those computer systems will also begin to fail shortly following the CRAC unit failure, assuming those associated computer systems are not receiving adequate cooling airflow from another CRAC unit. The failures in those computer systems are likely to cause delays in the performance of various computing functions or cause the various computing functions to shut down completely. The costs associated with the delays or shutdowns may be relatively high if the CRAC unit failure is not fixed in a relatively short period of time.

Thus, it would be desirable to be able to mitigate the losses in productivity associated with CRAC unit failures.

SUMMARY

A method of workload placement based upon capacity utilizations of a plurality of CRAC units is disclosed herein. In the method, the provisioning of the plurality of CRAC units is determined and the zone of influence for each of the plurality of CRAC units is also determined based upon the provisioning of the plurality of CRAC units. It is determined whether a CRAC unit of the plurality of CRAC units is at least one of near failure or has failed. In addition, the workload is shifted from a zone of influence of at least one CRAC unit to a zone of influence of another of the plurality of CRAC units in response to a determination that a CRAC unit is at least one of near failure or has failed.

BRIEF DESCRIPTION OF THE DRAWINGS

Features of the present invention will become apparent to those skilled in the art from the following description with reference to the figures, in which:

FIG. 1A shows a simplified perspective view of a section of a data center, according to an embodiment of the invention;

FIG. 1B shows a simplified plan view of the data center depicted in FIG. 1A;

FIG. 2 is a block diagram of a workload placement system according to an embodiment of the invention;

FIGS. 3A and 3B, collectively, illustrate a flow diagram of an operational mode of a method for workload placement among servers based upon CRAC unit capacities, according to an embodiment of the invention;

FIG. 4 shows an operational mode of a method for server power management to ensure that there is sufficient capacity in the CRAC units in the event of a CRAC unit failure, according to an embodiment of the invention;

FIG. 5 illustrates a computer system, which may be employed to perform the various functions of the workload placement system described herein, according to an embodiment of the invention.

DETAILED DESCRIPTION

For simplicity and illustrative purposes, the present invention is described by referring mainly to an exemplary embodiment thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent however, to one of ordinary skill in the art, that the present invention may be practiced without limitation to these specific details. In other instances, well known methods and structures have not been described in detail so as not to unnecessarily obscure the present invention.

As described in greater detail herein below, the loss in productivity due to a computer room air conditioning (CRAC) unit failure may substantially be mitigated through various workload placement techniques. Broadly speaking, the workload performed by components, for instance, computer systems, servers, and the like, in a zone of influence of a CRAC unit that has failed or is likely to fail may be shifted to components in the zones of influence of the other CRAC units. If it is determined that the shifting of the workload results in unsafe provisioning levels, the power states of the components in a given area may be lowered.

The power management of the components may also be controlled to generally ensure that there is sufficient capacity in the CRAC units to perform all of the workload in the event of a CRAC unit failure. As described below, the actual loads on the CRAC units may be compared to a threshold loading level and the power states of the components may be scaled down if the actual loads exceed the threshold loading level.
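The threshold comparison described above may be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the function names, the dictionary layout, and the integer "power level" scale (where a lower number is assumed to mean lower power draw) are all hypothetical.

```python
from __future__ import annotations


def units_over_threshold(actual_kw: dict[str, float],
                         threshold_kw: dict[str, float]) -> list[str]:
    """Return the CRAC units whose actual load exceeds their threshold load."""
    return sorted(u for u, load in actual_kw.items() if load > threshold_kw[u])


def scale_down(power_levels: dict[str, int],
               servers_by_unit: dict[str, list[str]],
               overloaded: list[str]) -> dict[str, int]:
    """Lower by one step (never below 0) each server cooled by an overloaded unit.

    Assumption: a lower integer means a lower power state.
    """
    affected = {s for unit in overloaded for s in servers_by_unit.get(unit, [])}
    return {s: max(0, lvl - 1) if s in affected else lvl
            for s, lvl in power_levels.items()}


# Hypothetical readings: unit 114a is above its threshold, unit 114b is not.
overloaded = units_over_threshold({"114a": 90.0, "114b": 50.0},
                                  {"114a": 80.0, "114b": 80.0})
print(scale_down({"s1": 2, "s2": 0, "s3": 3},
                 {"114a": ["s1", "s2"], "114b": ["s3"]},
                 overloaded))
```

Only the servers associated with an overloaded unit are touched, so the remaining servers keep their current power states.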

With reference first to FIG. 1A, there is shown a simplified perspective view of a section of a data center 100 which may employ various examples of the invention. The terms “data center” are generally meant to denote a room or other space where one or more components capable of generating heat may be situated. In this respect, the terms “data center” are not meant to limit the invention to any specific type of room where data is communicated or processed, nor should it be construed that use of the terms “data center” limits the invention in any respect other than its definition herein above.

It should be readily apparent that the data center 100 depicted in FIG. 1A represents a generalized illustration and that other components may be added or existing components may be removed or modified without departing from the scope of the invention. For example, the data center 100 may include any number of racks and various other components. In addition, it should also be understood that heat generating/dissipating components may be located in the data center 100 without being housed in racks.

The data center 100 is depicted as having a plurality of racks 102-108, for instance, electronics cabinets, aligned in parallel rows. Each of the rows of racks 102-108 is shown as containing four racks (a-d) positioned on a raised floor 110. A plurality of wires and communication lines (not shown) may be located in a space 112 beneath the raised floor 110. The space 112 may also function as a plenum for delivery of cooled air from one or more computer room air conditioning (CRAC) units 114 a and 114 b to the racks 102-108. The cooled air may be delivered from the space 112 to the racks 102-108 through vent tiles 118 located between some or all of the racks 102-108. The vent tiles 118 are shown as being located between racks 102 and 104 and 106 and 108.

As previously described, the CRAC units 114 a and 114 b generally operate to supply cooled air into the space 112. The cooled air contained in the space 112 may include cooled air supplied by one or more CRAC units 114 a and 114 b. Thus, characteristics of the cooled air, such as, temperature, pressure, flow rate, etc., may substantially be affected by one or more of the CRAC units 114 a and 114 b. By way of example, the cooled air supplied by one CRAC unit 114 a may mix with cooled air supplied by another CRAC unit 114 b. In this regard, characteristics of the cooled air at various areas in the space 112 and the cooled air supplied to the racks 102-108 may vary, for instance, if the temperatures or the volume flow rates of the cooled air supplied by these CRAC units 114 a and 114 b differ due to mixing of the cooled air. In certain instances, the level of influence of a CRAC unit 114 a and 114 b over the racks 102-108 may be higher for those racks 102-108 that are in closer proximity to the respective CRAC units 114 a and 114 b. In addition, the level of influence of a CRAC unit 114 a and 114 b over the racks 102-108 may be lower for those racks 102-108 that are located farther away from the CRAC unit 114 a and 114 b. Those racks 102-108 receiving a predetermined level of cooling airflow from a particular CRAC unit 114 a are considered as being within that CRAC unit's 114 a zone of influence. Various manners in which the zones of influence of the CRAC units 114 a and 114 b may be determined are described in greater detail herein below with respect to FIG. 1B.

The vent tiles 118 may comprise manually or remotely adjustable vent tiles. In this regard, the vent tiles 118 may be manipulated to vary, for instance, the mass flow rates of cooled air supplied to the racks 102-108. In addition, the vent tiles 118 may comprise the dynamically controllable vent tiles disclosed and described in commonly assigned U.S. Pat. No. 6,574,104, the disclosure of which is hereby incorporated by reference in its entirety. As described in the U.S. Pat. No. 6,574,104 patent, the vent tiles 118 are termed “dynamically controllable” because they generally operate to control at least one of velocity, volume flow rate and direction of the cooled airflow therethrough. In addition, specific examples of dynamically controllable vent tiles 118 may be found in U.S. Pat. No. 6,694,759, filed on Jan. 27, 2003, which is assigned to the assignee of the present invention and is incorporated by reference herein in its entirety.

The racks 102-108 are generally configured to house a plurality of components 116 capable of generating/dissipating heat (not shown), for instance, processors, micro-controllers, high-speed video cards, memories, semi-conductor devices, and the like. The components 116 may be elements of a plurality of subsystems (not shown), for instance, computers, servers, bladed servers, etc. The subsystems and the components may be operated to perform various electronic functions, for instance, computing, switching, routing, displaying, and the like. In the performance of these electronic functions, the components, and therefore the subsystems, may generally dissipate relatively large amounts of heat. Because the racks 102-108 have generally been known to include upwards of forty (40) or more subsystems, they may transfer substantially large amounts of heat to the cooled air flowing therethrough to maintain the subsystems and the components generally within predetermined operating temperature ranges.

The areas between the racks 102 and 104 and between the racks 106 and 108 may comprise cool aisles 120. These aisles are considered “cool aisles” because they are configured to receive cooled airflow from the vent tiles 118, as generally indicated by the arrows 122. In addition, the racks 102-108 generally receive cooled air from the cool aisles 120. The aisles between the racks 104 and 106, and on the rear sides of racks 102 and 108, are considered hot aisles 124. These aisles are considered “hot aisles” because they are positioned to receive air that has been heated by the components 116 in the racks 102-108, as indicated by the arrows 126. By substantially separating the cool aisles 120 and the hot aisles 124, for instance, with the racks 102-108, the heated air may substantially be prevented from re-circulating with the cooled air prior to delivery into the racks 102-108. In addition, the cooled air may also substantially be prevented from re-circulating with the heated air prior to returning to the CRAC units 114 a and 114 b. However, there may be areas in the data center 100 where re-circulation of the cooled air and the heated air occurs. By way of example, cooled air may mix with heated air around the sides or over the tops of one or more of the racks 102-108.

The sides of the racks 102-108 that face the cool aisles 120 may be considered as the fronts of the racks and the sides of the racks 102-108 that face away from the cool aisles 120 may be considered as the rears of the racks 102-108. For purposes of simplicity and not of limitation, this nomenclature will be relied upon throughout the present disclosure to describe the various sides of the racks 102-108.

According to another example, the racks 102-108 may be positioned with their rear sides adjacent to one another (not shown). In this embodiment, the vent tiles 118 may be provided in each aisle 120 and 124. In addition, the racks 102-108 may comprise outlets on top panels thereof to enable heated air to flow out of the racks 102-108.

As described herein above, the CRAC units 114 a and 114 b generally operate to cool received heated air as indicated by the arrows 126. In addition, the CRAC units 114 a and 114 b may supply the racks 102-108 with airflow that has been cooled, through any reasonably suitable known manners and may thus comprise widely available, conventional CRAC units 114 a and 114 b. For instance, the CRAC units 114 a and 114 b may comprise vapor-compression type air conditioning units, chiller type air conditioning units, etc. Examples of suitable CRAC units 114 a and 114 b may be found in co-pending and commonly assigned U.S. patent application Ser. No. 10/853,529, filed on May 26, 2004, and entitled “Energy Efficient CRAC Unit Operation,” the disclosure of which is hereby incorporated by reference in its entirety.

Also shown in FIG. 1A is a resource manager 128 configured to control various operations of the data center 100. The resource manager 128 may operate, for instance, to control the vent tiles 118 to thereby vary at least one of a direction and a volume flow rate of cooled airflow delivered through the vent tiles 118. The resource manager 128 may also operate to vary the power states of the components 116 as described in greater detail herein below. As also described herein below, the resource manager 128 may operate to vary workload among variously located components 116 in response to failure in a CRAC unit 114 a. In addition, the resource manager 128 may vary the workload among variously located components 116 in preparation of, or in response to, a potential CRAC unit 114 a failure. Although the resource manager 128 is illustrated in FIG. 1A as comprising a component separate from the components 116 housed in the racks 102-108, the resource manager 128 may comprise one or more of the components 116 without departing from a scope of the data center 100 disclosed herein.

The data center 100 is illustrated in FIG. 1A as containing four rows of racks 102-108 and two CRAC units 114 a and 114 b for purposes of simplicity and illustration. Thus, the data center 100 should not be construed as being limited in any respect to the number of racks 102-108 and CRAC units 114 a and 114 b illustrated in FIG. 1A. In addition, although the racks 102-108 have all been illustrated similarly, the racks 102-108 may comprise heterogeneous configurations. For instance, the racks 102-108 may be manufactured by different companies or the racks 102-108 may be designed to house differing types of components 116, for example, horizontally mounted servers, bladed servers, etc.

With reference now to FIG. 1B, there is shown a simplified plan view of the data center 100 depicted in FIG. 1A. The data center 100 is shown as including CRAC units 114 a-114 e positioned at various locations throughout the data center 100. The CRAC units 114 c-114 e may generally comprise the same or similar configurations as the CRAC units 114 a and 114 b described herein above. A plurality of vent tiles 118 are also illustrated in FIG. 1B and are configured to deliver cooling airflow to racks 102 a-102 n located in respective vicinities of the vent tiles 118. The racks 102 a-102 n have been labeled with the reference numerals 102 a-102 n for purposes of simplicity and are generally intended to refer to the same racks 102-108 depicted in FIG. 1A. In addition, the reference character “n” represents any integer number greater than 1. In this regard, the data center 100 may include any number of racks 102 a-102 n and is not limited to the number of racks 102 a-102 n illustrated in FIG. 1B.

As described herein above, the vent tiles 118 and the racks 102 a-102 n are positioned on a raised floor 110, beneath which lies a space 112 (FIG. 1A). The space 112 is in fluid communication with the CRAC units 114 a-114 e and generally operates, in one respect, as a plenum for supplying cooling airflow from the CRAC units 114 a-114 e to be delivered through the vent tiles 118. In most instances, the space 112 may comprise a relatively open space that is accessible by cooling airflow supplied by a plurality of the CRAC units 114 a-114 e. In this regard, the cooling airflow supplied by the CRAC units 114 a-114 e may mix in the space 112. Therefore, the cooling airflow supplied to the racks 102 a-102 n by the vent tiles 118 may have originated from more than one of the CRAC units 114 a-114 e.

The CRAC units 114 a-114 e thus influence respective areas in the data center 100. In addition, each of the CRAC units 114 a-114 e may influence the respective areas to a certain extent. The racks 102 a-102 n over which the CRAC units 114 a-114 e have a predetermined level of influence are considered herein as being within the zone of influence 130 a-130 e of the respective CRAC units 114 a-114 e. Thus, those racks 102 a-102 n that receive cooling airflow from a CRAC unit at a level that is below the predetermined level may be considered as being outside of that CRAC unit's zone of influence. The predetermined level may be determined based upon a plurality of factors. The factors may include, for instance, a minimum level of cooling airflow required from a CRAC unit 114 a to safely operate the components 116 in the rack 102 a-102 n should the other CRAC units 114 b-114 e fail. As another example, the predetermined level may be set to a predetermined percentage level.
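The membership rule above may be illustrated with a short sketch. It assumes, purely for illustration, that the influence of each CRAC unit over each rack is already available as a number between 0 and 1 and that the predetermined level is a single fraction; the function name and data layout are hypothetical.

```python
from __future__ import annotations


def zones_of_influence(influence: dict[str, dict[str, float]],
                       level: float) -> dict[str, set[str]]:
    """Map each CRAC unit to the racks it influences at or above `level`.

    Racks whose influence value is below `level` fall outside the zone.
    """
    return {unit: {rack for rack, value in racks.items() if value >= level}
            for unit, racks in influence.items()}


# Hypothetical influence values for two CRAC units and two racks.
influence = {
    "114a": {"102a": 0.9, "102b": 0.4},
    "114b": {"102a": 0.1, "102b": 0.6},
}
zones = zones_of_influence(influence, level=0.4)
print({unit: sorted(racks) for unit, racks in zones.items()})
```

With these values, rack 102b lies in both zones while rack 102a lies only in the zone of unit 114a, which mirrors the overlapping sections discussed with respect to FIG. 1B.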

FIG. 1B illustrates an example of the respective zones of influence 130 a-130 e of the CRAC units 114 a-114 e. The zones of influence 130 a-130 e of the respective CRAC units 114 a-114 e may be determined through various thermodynamic modeling techniques. An example of a suitable modeling technique is described in Patel et al., “Thermal Considerations in Cooling Large Scale High Compute Density Data Centers”, Itherm 2002, the disclosure of which is incorporated herein by reference in its entirety. Thus, for instance, through various modeling techniques, the levels of influence the CRAC units 114 a-114 e have over particular racks 102 a-102 n may be determined and mapped as shown in FIG. 1B.

In addition, or alternatively, the zones of influence 130 a-130 e of the CRAC units 114 a-114 e may be determined through individually testing for the influences of each of the CRAC units 114 a-114 e over each of the racks 102 a-102 n. A commissioning process may also be employed to determine the influences each of the CRAC units 114 a-114 e have over respective ones of the racks 102 a-102 n. It should, in any regard, be understood that the zones of influence 130 a-130 e depicted in FIG. 1B are for purposes of illustration and are not intended to limit the data center 100 and its components in any respect.

In any regard, as shown in FIG. 1B, some of the racks, for instance, the racks in a first section 132 a may be included in the zone of influence 130 a of a single CRAC unit 114 a. Some of the other racks, for instance, the racks in a second section 132 b may be included in the zones of influence 130 a and 130 b of two CRAC units 114 a and 114 b. In addition, some of the racks, for instance, the racks in a third section 132 c may be included in the zones of influence 130 a-130 c of three CRAC units 114 a-114 c. As such, for example, if the CRAC unit 114 a were to fail, the racks in the first section 132 a of the zone of influence 130 a would not receive adequate levels of cooling fluid because they are not within any of the other CRAC unit's zone of influence.
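The consequence of a single CRAC unit failure for the three kinds of sections may be sketched as a set difference. The zone contents below are hypothetical stand-ins for the sections 132 a-132 c, and the function name is an assumption made for this illustration.

```python
from __future__ import annotations


def unprotected_racks(zones: dict[str, set[str]], failed: str) -> set[str]:
    """Return racks in the failed unit's zone that no other unit's zone covers."""
    covered_by_others = set().union(
        *(racks for unit, racks in zones.items() if unit != failed))
    return zones[failed] - covered_by_others


# Hypothetical zones: one rack per section 132a (one unit), 132b (two), 132c (three).
zones = {
    "114a": {"rack_132a", "rack_132b", "rack_132c"},
    "114b": {"rack_132b", "rack_132c"},
    "114c": {"rack_132c"},
}
print(unprotected_racks(zones, "114a"))  # only the rack in section 132a remains
```

Racks returned by this check are the candidates whose workload would need to be shifted, since no other CRAC unit supplies them at the predetermined level.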

As will be described in greater detail herein below, the components 116 located within the zone of influence of a failed CRAC unit may be powered down when certain conditions are met. In addition, the components 116 located within the zone of influence of an operational CRAC unit may be powered up and may have the workload from the powered down components 116 shifted thereon. Again, the shifting of the workload may occur when certain conditions are met as described in greater detail herein below.
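One simple way to picture the shift is a greedy pass that moves workload from servers in the failed unit's zone onto spare capacity elsewhere. This is a sketch under stated assumptions, not the method of FIGS. 3A and 3B: the `Server` record, the abstract workload and capacity units, and the greedy ordering are all hypothetical, and the conditions that must be met before a shift are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Server:
    name: str
    crac_units: frozenset[str]  # units whose zones of influence cover this server
    workload: float             # current workload, arbitrary units
    capacity: float             # maximum workload, same units


def shift_workload(servers: list[Server], failed_unit: str):
    """Move workload off servers in the failed unit's zone; return moves and leftover."""
    sources = [s for s in servers if failed_unit in s.crac_units]
    targets = [s for s in servers if failed_unit not in s.crac_units]
    moves = []
    for src in sources:
        for dst in targets:
            if src.workload <= 0:
                break
            amount = min(dst.capacity - dst.workload, src.workload)
            if amount > 0:
                dst.workload += amount
                src.workload -= amount
                moves.append((src.name, dst.name, amount))
    unplaced = sum(s.workload for s in sources)
    return moves, unplaced


fleet = [
    Server("s1", frozenset({"114a"}), workload=30.0, capacity=50.0),
    Server("s2", frozenset({"114b"}), workload=20.0, capacity=40.0),
    Server("s3", frozenset({"114c"}), workload=10.0, capacity=30.0),
]
print(shift_workload(fleet, "114a"))
```

Any workload reported as unplaced corresponds to the case in which the remaining zones cannot absorb everything, which is where lowering power states would come into play.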

FIG. 2 is a block diagram 200 of a workload placement system 202. It should be understood that the following description of the block diagram 200 is but one manner of a variety of different manners in which such a workload placement system 202 may be configured. In addition, it should be understood that the workload placement system 202 may include additional components and that some of the components described herein may be removed and/or modified without departing from the scope of the invention. For instance, the workload placement system 202 may include any number of sensors, servers, CRAC units, etc., as well as other components, which may be implemented in the operations of the workload placement system 202.

As shown, the workload placement system 202 includes the resource manager 128 depicted in FIGS. 1A and 1B. As described hereinabove, the resource manager 128 is configured to perform various functions in the data center 100. In this regard, the resource manager 128 may comprise a computing device, for instance, a computer system, a server, etc. In addition, the resource manager 128 may comprise a microprocessor, a micro-controller, an application specific integrated circuit (ASIC), and the like, configured to perform various processing functions. In one respect, the resource manager 128 may comprise a controller of another computing device.

The resource manager 128 is illustrated as being connected to a memory 204. However, in certain instances, the memory 204 may form part of the resource manager 128 without departing from a scope of the workload placement system 202. Generally speaking, the memory 204 may be configured to provide storage of software, algorithms, and the like, that provide the functionality of the resource manager 128. By way of example, the memory 204 may store an operating system 206, application programs 208, program data 210, and the like. In this regard, the memory 204 may be implemented as a combination of volatile and non-volatile memory, such as DRAM, EEPROM, MRAM, flash memory, and the like. In addition, or alternatively, the memory 204 may comprise a device configured to read from and write to a removable media, such as, a floppy disk, a CD-ROM, a DVD-ROM, or other optical or magnetic media.

The memory 204 may also store a CRAC unit capacity module 212, which the resource manager 128 may implement to perform various functions with respect to information pertaining to the CRAC units 114 a-114 n. For instance, the CRAC unit capacity module 212 may be employed to determine the actual loads on the CRAC units 114 a-114 n based upon various sensed information as described herein below. The actual loads may be used to determine the levels to which the capacities of the CRAC units 114 a-114 n are being utilized.

The CRAC unit capacity module 212 may also be implemented to access information pertaining to the rated capacities of the CRAC units 114 a-114 n, which may be stored, for instance, in the program data 210. The rated capacities may pertain to the operating limits set forth by the CRAC unit 114 a-114 n manufacturers or limits determined through testing. In other instances, the rated capacities may comprise user-defined limits based upon one or more criteria as described herein below. In any regard, the CRAC unit capacity module 212 may also calculate and compare the difference between the actual loads and the rated capacities of the CRAC units 114 a-114 n.
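The comparison of actual loads with rated capacities may be sketched as a per-unit utilization and headroom calculation. The function name, the kilowatt units, and the report layout are assumptions made for this illustration and are not taken from the CRAC unit capacity module 212 itself.

```python
from __future__ import annotations


def capacity_report(actual_kw: dict[str, float],
                    rated_kw: dict[str, float]) -> dict[str, dict[str, float]]:
    """For each CRAC unit, report utilization (actual/rated) and remaining headroom."""
    return {
        unit: {
            "utilization": actual_kw[unit] / rated_kw[unit],
            "headroom_kw": rated_kw[unit] - actual_kw[unit],
        }
        for unit in actual_kw
    }


# Hypothetical loads and rated capacities for two CRAC units.
report = capacity_report({"114a": 60.0, "114b": 95.0},
                         {"114a": 100.0, "114b": 100.0})
for unit, values in report.items():
    print(unit, values)
```

A unit with little headroom, such as the second one in this example, would be a poor destination for shifted workload.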

According to an example, the resource manager 128 may be configured to generate a plurality of maps that may be employed to visualize or characterize various conditions in the data center 100. For instance, the resource manager 128 may generate maps of the performance capacities of the CRAC units 114 a-114 n, maps of the cooling capacities, including the zones of influence, of the CRAC units 114 a-114 n, maps of the actual resource utilizations, and maps of the actual cooling loads. Some or all of the maps may be generated through use of a computational fluid dynamics modeling program. Others of the maps may be generated through use of the information obtained by the resource manager 128 regarding, for instance, the actual loads and rated capacities of the CRAC units 114 a-114 n. As will be described in greater detail herein below, one or more of these maps may be employed to determine when and where workload is to be shifted.

A workload placement determination module 214 may also be stored in the memory 204. The workload placement determination module 214 generally operates to determine how and where workloads should be placed based upon the actual load determinations made by the CRAC unit capacity module 212. The resource manager 128 may implement the CRAC unit capacity module 212 and the workload placement determination module 214 in controlling at least one or both of the power states and the workloads on servers 216 a-216 n positioned in various zones of influence 130 a-130 n (FIG. 1B). In one respect, the resource manager 128 may vary at least one or both of the power states and the workloads on the servers 216 a-216 n to generally ensure that there is sufficient capacity in the CRAC units 114 a-114 n to compensate for failures in one or more of the CRAC units 114 a-114 n. In another respect, the resource manager 128 may vary at least one or both of the power states and the workloads on the servers 216 a-216 n in response to or in preparation of the failure of one or more of the CRAC units 114 a-114 n.

In addition, or alternatively, the resource manager 128 may generate maps configured to forecast how proposed workload shifts will affect, for instance, the loading on the operational CRAC units 114 a-114 n. The forecast maps may also be generated through use of a computational fluid dynamics modeling program, with the projected conditions being used as inputs in the modeling algorithm. Through use of these forecast maps, the potential impacts on the conditions in the data center 100 with the shifted workload may be determined prior to the actual implementation of the workload shifts. In addition, these forecast maps may generally ensure that the load placed after satisfying all of the policies receives sufficient levels of cooling airflow.

According to an example, the forecast maps may be generated prior to a CRAC unit 114 a-114 n failure or potential failure. In this regard, for instance, one or more forecast maps may be generated during a modeling of the data center 100. In this example, a plurality of maps may be generated depicting various failure mitigation scenarios. These maps may be relied upon to relatively quickly determine an appropriate failure mitigation scenario in the event of a CRAC unit 114 a-114 n failure or potential failure.

The use of the reference character “n” for the CRAC units 114 a-114 n and the servers 216 a-216 n is to represent any integer number greater than 1. The ellipses “ . . . ” also indicate that a number of CRAC units “n−1” and a number of servers “n−1” are included in the workload placement system 202. In this regard, the workload placement system 202 may include any number of CRAC units 114 a-114 n and servers 216 a-216 n and is not limited to the number of CRAC units 114 a-114 n and servers 216 a-216 n illustrated in FIG. 2.

Instructions from the resource manager 128 may be transmitted over a network 218 that operates to couple the various components of the workload placement system 202. Although not shown, the resource manager 128 may be equipped with software and/or hardware to enable the resource manager 128 to transmit and receive data over the network 218. The network 218 generally represents a wired or wireless structure in the data center 100 for the transmission of data between the various components of the workload placement system 202. The network 218 may comprise an existing network infrastructure or it may comprise a separate network configuration installed for the purpose of workload placement by the resource manager 128.

In any respect, the servers 216 a-216 n may be interfaced with the network 218 through respective interfaces (I/F) 220. The interfaces 220 may include any reasonably suitable known hardware and/or software capable of enabling data communications between the servers 216 a-216 n and other components, including the resource manager 128, over the network 218. Each of the servers 216 a-216 n also includes at least one processor 222 configured to perform various operations in manners generally known in the art. According to one example, the resource manager 128 may control the performance or p-states of the processor(s) 222 to at least one or both of prepare and compensate for CRAC unit failure. In another example, the resource manager 128 may operate under automatic control of power consumption (ACPC) to control the power consumption of the servers 216 a-216 n.

Manners in which the performance or p-states may be modified and the ACPC may be implemented are described in Bodas, Devas, “New Server Power-Management Technologies Address Power and Cooling Challenges”, Technology@Intel Magazine, 2003. The disclosure contained in that article is incorporated by reference herein in its entirety. Another example of a suitable manner in which the performance or power states of the servers 216 a-216 n may be modified is the POWERNOW! technology available from Advanced Micro Devices of Sunnyvale, Calif.

In making determinations of how to control the performance or p-states of the processors 222, or in shifting the workload among the servers 216 a-216 n, the resource manager 128 may rely upon the determinations made through implementation of the CRAC unit capacity module 212 and the workload placement determination module 214. These determinations may be based upon information received from sensors variously positioned with respect to respective CRAC units 114 a-114 n. More particularly, respective return air sensors 224 may be positioned to detect the temperatures of the return air received by each of the CRAC units 114 a-114 n. Respective supply air sensors 226 may be positioned to detect the temperatures of the air supplied by each of the CRAC units 114 a-114 n. The return air sensors 224 and the supply air sensors 226 may comprise any reasonably suitable sensors capable of detecting the temperature of air. The sensors 224 and 226 may thus comprise thermistors, thermometers, thermocouples, etc. The resource manager 128 may consider additional factors in selecting the servers 216 a-216 n on which to place the workload. The additional factors may include, for instance, the terms contained in a Service Level Agreement, security levels of the servers 216 a-216 n, processor speeds, etc.
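The additional selection factors may be pictured as a filter applied to candidate servers before workload is placed. The record fields, the numeric security scale, and the minimum-speed criterion below are hypothetical; the terms of an actual Service Level Agreement would likely be richer than two thresholds.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServerInfo:
    name: str
    security_level: int  # assumption: a higher number means a more secure server
    cpu_ghz: float       # nominal processor speed


def eligible_servers(servers: list[ServerInfo],
                     min_security: int,
                     min_cpu_ghz: float) -> list[str]:
    """Return names of servers meeting both the security and the speed requirement."""
    return [s.name for s in servers
            if s.security_level >= min_security and s.cpu_ghz >= min_cpu_ghz]


candidates = [
    ServerInfo("216a", security_level=3, cpu_ghz=2.4),
    ServerInfo("216b", security_level=1, cpu_ghz=3.0),
    ServerInfo("216c", security_level=3, cpu_ghz=1.8),
]
print(eligible_servers(candidates, min_security=2, min_cpu_ghz=2.0))
```

In this example only the first server satisfies both requirements, so it alone would remain a candidate for receiving shifted workload.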

Air flow sensors 228 may also be positioned at respective output locations of the CRAC units 114 a-114 n to detect the flow rate of the cooled airflow supplied by each of the CRAC units 114 a-114 n. The air flow sensors 228 may also comprise any reasonably suitable sensor capable of detecting the flow rate of air supplied by the CRAC units 114 a-114 n. In addition, the temperature sensors 224, 226 and/or the air flow sensors 228 may comprise sensors that have been integrally manufactured with the CRAC units 114 a-114 n or they may comprise sensors that have been positioned with respect to each of the CRAC units 114 a-114 n following their installations in the data center 100. In any respect, the condition information detected by these sensors 224-228 may be transmitted to the resource manager 128 through the network 218.
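This section does not state how the sensed temperatures and flow rate are combined into an actual load. One common choice is the standard sensible-heat relation, Q = ρ · V · c_p · (T_return − T_supply), sketched below. The use of this relation here, the function name, and the constant air properties (roughly 1.2 kg/m³ and 1005 J/(kg·K) near room conditions) are assumptions for illustration.

```python
AIR_DENSITY_KG_M3 = 1.2       # assumed approximate density of air near room conditions
AIR_CP_J_PER_KG_K = 1005.0    # assumed approximate specific heat of air


def crac_load_kw(flow_m3_s: float, return_temp_c: float, supply_temp_c: float) -> float:
    """Estimate the heat removed by a CRAC unit, in kW, from sensed quantities.

    flow_m3_s:      volume flow rate of supplied air (air flow sensor)
    return_temp_c:  return air temperature (return air sensor)
    supply_temp_c:  supply air temperature (supply air sensor)
    """
    delta_t = return_temp_c - supply_temp_c
    watts = AIR_DENSITY_KG_M3 * flow_m3_s * AIR_CP_J_PER_KG_K * delta_t
    return watts / 1000.0


# Hypothetical readings: 5 m^3/s of air cooled from 30 C to 15 C, about 90 kW.
print(round(crac_load_kw(5.0, 30.0, 15.0), 2))
```

The resulting value is the kind of "actual load" figure that could then be compared against a rated capacity.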

The illustration of the temperature sensors 224, 226 and the air flow sensors 228 as forming part of respective CRAC units 114 a-114 n is to depict the correlation between the respective CRAC units 114 a-114 n and the sensors 224-228. Thus, the sensors 224-228 should not be construed as necessarily forming part of the CRAC units 114 a-114 n.

In this regard, the CRAC units 114 a-114 n may include respective interfaces 230 that generally enable data transfer between the CRAC units 114 a-114 n and the resource manager 128 over the network 218. The interfaces 230 may comprise any reasonably suitable hardware and/or software capable of enabling the data transfer.

The resource manager 128 may also receive information pertaining to the rated capacities of the CRAC units 114 a-114 n. The rated capacities may pertain to, for instance, the operational limits of the CRAC units 114 a-114 n or the safe and efficient operating capacities as rated by the manufacturers of the CRAC units 114 a-114 n or as determined through testing of the CRAC units 114 a-114 n. In addition, or alternatively, this information may be stored in the form of a database in the memory 204.

The memory 204 may also include additional information, such as correlations between the identifications of the CRAC units 114 a-114 n and their rated capacities, the locations of the CRAC units 114 a-114 n, etc. The memory 204 may also store information pertaining to the zones of influence 130 a-130 e of each of the CRAC units 114 a-114 n. A user may program some or all of the information contained in the database. Alternatively, this information may be substantially automatically programmed. For instance, the resource manager 128 or another computing device may automatically update the database when CRAC units 114 a-114 n are removed, added, moved, or modified.

Although the workload placement system 202 has been illustrated as a system contained in a data center 100, various principles described with respect to the workload placement system 202 may be applied to multiple data centers 100. For instance, some or all of the CRAC units 114 a-114 n and the servers 216 a-216 n in the zones of influence of those CRAC units 114 a-114 n may be located in at least one different data center from the resource manager 128. In this example, part or all of the network 218 may comprise the Internet and the resource manager 128, which may be located in a first data center 100, may be configured to transmit and receive data over the Internet to and from CRAC units 114 a-114 n and servers 216 a-216 n located in a different data center.

As such, the resource manager 128 may be configured to place the workload among servers 216 a-216 n located in different data centers, which may also be located in different parts of the world, for instance. An example of an environment in which the resource manager 128 may operate to place the workload in servers 216 a-216 n that are in different geographic locations from the resource manager 128 may be found in commonly assigned and co-pending U.S. patent application Ser. No. 10/820,786, filed on Apr. 9, 2004, and entitled “Workload Placement Among Data Centers Based Upon Thermal Efficiency”, the disclosure of which is hereby incorporated by reference in its entirety. This example provides greater flexibility in ensuring that the workload is performed in response to one or more CRAC units failing.

FIGS. 3A and 3B, collectively, illustrate a flow diagram of an operational mode 300 of a method for workload placement among servers based upon CRAC unit capacity utilizations. It is to be understood that the following description of the operational mode 300 is but one manner of a variety of different manners in which an embodiment of the invention may be practiced. It should also be apparent to those of ordinary skill in the art that the operational mode 300 represents a generalized illustration and that other steps may be added or existing steps may be removed, modified or rearranged without departing from a scope of the operational mode 300.

The description of the operational mode 300 is made with reference to the block diagram 200 illustrated in FIG. 2, and thus makes reference to the elements cited therein. It should, however, be understood that the operational mode 300 is not limited to the elements set forth in the block diagram 200. Instead, it should be understood that the operational mode 300 may be practiced by a workload placement system having a different configuration than that set forth in the block diagram 200.

The operational mode 300 may be implemented to track and compensate for actual or potential CRAC unit failures. Thus, for instance, if a CRAC unit were to fail or were determined to be close to failure, the workload being performed or scheduled to be performed by the servers in the zone of influence of that CRAC unit may be shifted to other servers in other zones of influence, provided, for instance, that there is sufficient capacity in the CRAC units serving those other zones of influence. In one regard, the reduction in workflow due to the failure of a CRAC unit may thus be minimized through implementation of the operational mode 300.

The operational mode 300 may be initiated at step 302 in response to any of a number of stimuli or conditions. For instance, the operational mode 300 may be initiated with activation of the components in the data center 100, such as the CRAC units 114 a-114 n and/or the servers 216 a-216 n. In addition, or alternatively, the operational mode 300 may be manually initiated or the resource manager 128 may be programmed to initiate the operational mode 300 at various times, for a set duration of time, substantially continuously, etc.

Once initiated, the resource manager 128 may determine the provisioning of the CRAC units 114 a-114 n at step 304. The provisioning of the CRAC units 114 a-114 n may be determined through conventional modeling or metrology techniques using a determined heat load and/or a projected distribution of heat load. In addition, the zones of influence 130 a-130 e (FIG. 1B) may be determined at step 304.

At step 306, the resource manager 128 may receive the temperatures of the return air received and the temperatures of the air supplied by each of the CRAC units 114 a-114 n. In addition, the resource manager 128 may receive flow rate information of airflow supplied by each of the CRAC units 114 a-114 n at step 308. As described herein above, the temperatures may be detected by respective sensors 224, 226 and the supply airflow rates may be detected by respective sensors 228. In addition, the condition data detected by these sensors 224-228 may be communicated to the resource manager 128 over the network 218.

Based upon the detected condition information received from the sensors 224-228, the resource manager 128 may calculate the actual loads on the CRAC units 114 a-114 n, at step 310. The resource manager 128 may calculate the actual loads (Q) on the CRAC units 114 a-114 n through the following equation:

$$Q = \dot{m}\, C_p\, (T_{in} - T_{out}) \qquad \text{Equation (1)}$$

According to a first example, in Equation (1), $\dot{m}$ is the mass flow rate of airflow supplied by a CRAC unit 114 a-114 n and may be in (kg/s), $C_p$ is the specific heat of air, which may be in (J/kg-C), $T_{in}$ is the temperature of airflow received into the CRAC unit 114 a-114 n, which may be in (° C.), and $T_{out}$ is the temperature of the airflow supplied by the CRAC unit 114 a-114 n, which may be in (° C.). The actual load (Q) on each of the CRAC units 114 a-114 n may be in (kW).

According to a second example, the actual loads (Q) may be determined in a water-chiller type CRAC unit through detection of the characteristics of the chilled water flow. In this example, and with reference to Equation (1), $\dot{m}$ is the mass flow rate of water flow through the chiller (kg/s), $C_p$ is the specific heat of water (J/kg-C), $T_{in}$ is the temperature of the water at an inlet of the chiller (° C.), and $T_{out}$ is the temperature of the water at an outlet of the chiller (° C.).

Either of the examples above may be employed to calculate the actual loads on each of the CRAC units 114 a-114 n without departing from a scope of the operational mode 300.
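By way of illustration only, Equation (1) may be sketched in Python as follows. The function name, the specific heat constants, and the sample sensor readings are assumptions introduced for this sketch and do not appear in the description above. The result is divided by 1000 because a mass flow rate in kg/s and a specific heat in J/kg-C yield a load in watts.

```python
# Illustrative sketch of Equation (1): Q = m_dot * C_p * (T_in - T_out).
# All names and numeric values below are hypothetical.

CP_AIR_J_PER_KG_C = 1005.0    # assumed approximate specific heat of air
CP_WATER_J_PER_KG_C = 4186.0  # assumed approximate specific heat of water


def crac_actual_load_kw(mass_flow_kg_s, cp_j_per_kg_c, t_in_c, t_out_c):
    """Return the actual load Q on a CRAC unit in kW per Equation (1)."""
    q_watts = mass_flow_kg_s * cp_j_per_kg_c * (t_in_c - t_out_c)
    return q_watts / 1000.0  # convert W to kW


# First example: air side (return air at 30 C, supply air at 15 C).
air_load_kw = crac_actual_load_kw(5.0, CP_AIR_J_PER_KG_C, 30.0, 15.0)

# Second example: chilled water side (chiller inlet 12 C, outlet 7 C).
water_load_kw = crac_actual_load_kw(2.0, CP_WATER_J_PER_KG_C, 12.0, 7.0)
```

With these assumed readings, the air-side load works out to about 75.4 kW and the water-side load to about 41.9 kW.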

The rated capacities of the CRAC units 114 a-114 n may optionally be determined at step 312. Step 312 may be considered as being optional because information pertaining to the rated capacities of the CRAC units 114 a-114 n may have previously been determined and stored in the memory 204. As described above, the rated capacities of the CRAC units 114 a-114 n may be set by the CRAC unit manufacturers or these capacities may be determined through testing and may vary from one CRAC unit 114 a to another CRAC unit 114 b, for instance. In addition, or alternatively, the rated capacities of the CRAC units 114 a-114 n may be selected based upon one or both of the most energy efficient and least sound-producing operation. Thus, the ratings for the capacities may be selected based upon any number of criteria. For instance, a CRAC unit 114 a with a maximum capacity of 100 kW may be most efficient at 85 kW, which may be determined, for instance, through hydronics and compressor analysis during commissioning of the data center 100.

At step 314, the resource manager 128 may calculate the difference between the actual loads (AL) on the CRAC units 114 a-114 n and the rated capacities (RC) of the CRAC units 114 a-114 n. More particularly, for instance, the resource manager 128 may determine whether the sum of the actual loads (AL) for all of the CRAC units 114 a-114 n falls below the sum of the rated capacities (RC) for all of the CRAC units 114 a-114 n, as indicated at step 316. If the resource manager 128 determines that the actual loads exceed or equal the rated capacities, the resource manager 128 may output an indication that the heat load cannot be shifted as indicated at step 318. In other words, the resource manager 128 may determine and may indicate that there is insufficient cooling capacity in the CRAC units 114 a-114 n to compensate for the failures of one or more CRAC units 114 a-114 n.

In a first example, the resource manager 128 may continue with the operational mode 300 by continuing to perform steps 306-316. In addition, in this example, if a CRAC unit failure occurs, the workload may not be shifted from the zone of influence 130 a-130 e of the CRAC unit that failed since the other CRAC units 114 a-114 n may have insufficient capacity to accept the additional load.

In a second example, the resource manager 128 may perform some or all of the steps outlined in the operational mode 400 (FIG. 4). As described in greater detail herein below, the operational mode 400 may be performed to provide sufficient capacity in the CRAC units 114 a-114 n to enable the workload performed on servers in a zone of influence 130 a-130 e of a failed CRAC unit to be shifted to servers in other zones of influence.

At step 316, if it is determined that the sum of the actual loads (AL) of the CRAC units 114 a-114 n falls below the sum of the rated capacities (RC), the resource manager 128 may determine whether a CRAC unit failure has occurred or is potentially likely to occur, as indicated at step 320. The resource manager 128 may determine that a CRAC unit is potentially likely to fail, for instance, if the resource manager 128 detects that the CRAC unit is not operating within certain parameters. If all of the CRAC units 114 a-114 n are operating within normal guidelines, the resource manager 128 may continue to perform steps 306-320 until a CRAC unit failure is detected at step 320 or until the operational mode 300 is discontinued. The resource manager 128 may determine that a CRAC unit has failed or is likely to fail, for instance, if the resource manager 128 ceases to receive detected condition information from that CRAC unit, if the measurements of the temperature sensors 224, 226 and/or the air flow rate sensor 228 of that CRAC unit indicate that there is a problem, etc.
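One possible form of the determination at step 320 is sketched below in Python. The description above does not specify the operating parameters, so the silence timeout, the supply temperature range, and the minimum flow rate are hypothetical values. Mapping a loss of sensor data to "failed" and out-of-range readings to "near_failure" is likewise an assumption made for this sketch.

```python
# Illustrative sketch of step 320. All thresholds are hypothetical.

def crac_status(last_report_s, now_s, supply_temp_c, flow_kg_s,
                max_silence_s=60.0, supply_range_c=(10.0, 20.0),
                min_flow_kg_s=1.0):
    """Classify a CRAC unit as 'failed', 'near_failure', or 'ok'."""
    # No condition information received recently: treat the unit as failed.
    if now_s - last_report_s > max_silence_s:
        return "failed"
    low_c, high_c = supply_range_c
    # Readings outside the assumed normal operating parameters.
    if not (low_c <= supply_temp_c <= high_c) or flow_kg_s < min_flow_kg_s:
        return "near_failure"
    return "ok"


silent_unit = crac_status(0.0, 120.0, 15.0, 5.0)    # no data for 120 s
warm_unit = crac_status(100.0, 120.0, 25.0, 5.0)    # supply air too warm
normal_unit = crac_status(100.0, 120.0, 15.0, 5.0)  # within parameters
```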

If, at step 320, a CRAC unit failure or potential failure is detected, the resource manager 128 may determine the location of that CRAC unit at step 322. The resource manager 128 may determine the location of that CRAC unit based upon, for instance, one or more locations in the data center 100 where the temperature is outside of a predetermined range, direct detection of a CRAC unit failure, etc. In any regard, at step 324, the resource manager 128 may calculate the ability in the data center 100 to redistribute the heat load. More particularly, the resource manager 128 may subtract the sum of the actual loads (AL) on the CRAC units 114 a-114 n from the sum of the rated capacities (RC) to account for the failure of a CRAC unit, for instance, CRAC unit 114 a.

By way of example, if a data center contains four CRAC units rated at 100 kW each, the sum of their rated capacities would total 400 kW. In addition, if the sum of the actual capacity utilizations of the four CRAC units totaled 250 kW, there would be a buffer of 150 kW. Thus, if a single CRAC unit were to fail, the buffer in the available capacity would be 150 kW.

The resource manager 128 may determine whether the remaining CRAC units 114 b-114 n have sufficient capacity to receive the workload from the servers 216 a-216 n contained in the zone 130 a, and more particularly from the servers 216 a-216 n contained in the section 132 a, at step 326. The resource manager 128 may determine that the remaining CRAC units 114 b-114 n have sufficient capacity if there is a buffer in the available capacity as in the example above. If the resource manager 128 determines that there is insufficient capacity, the resource manager 128 may output an indication that the heat load cannot be shifted as indicated at step 318.
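The buffer calculation of the four-unit example may be sketched in Python as follows. The per-unit loads are hypothetical values chosen only so that they sum to the 250 kW given in the example, and the function name is an assumption of this sketch.

```python
# Illustrative sketch of steps 324-326: buffer = sum(RC) - sum(AL).

def capacity_buffer_kw(rated_kw, actual_kw):
    """Return the sum of rated capacities minus the sum of actual loads."""
    return sum(rated_kw) - sum(actual_kw)


rated_kw = [100.0, 100.0, 100.0, 100.0]  # four CRAC units rated at 100 kW
actual_kw = [70.0, 60.0, 65.0, 55.0]     # hypothetical loads totaling 250 kW

buffer_kw = capacity_buffer_kw(rated_kw, actual_kw)

# A positive buffer is treated as sufficient capacity; otherwise the
# indication of step 318 (heat load cannot be shifted) would be output.
can_shift = buffer_kw > 0
```

Here `buffer_kw` evaluates to 150.0, matching the example above.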

If, however, the resource manager 128 determines that there is sufficient capacity, the resource manager 128 may calculate the available capacities in each of the zones 130 b-130 e of the remaining CRAC units 114 b-114 n at step 328. Step 328 may include the determination of the types of servers 216 a-216 n contained in each of the zones 130 a-130 e to determine, initially, if there are available servers 216 a-216 n capable of performing the workload contained in the servers 216 a-216 n in the zone 130 a. According to an example, the availabilities of the servers 216 a-216 n to perform the workload may be determined based upon the available states of the servers 216 a-216 n as disclosed in commonly assigned and co-pending U.S. patent application Ser. No. 10/929,448, entitled “Workload Placement Based On Thermal Considerations,” filed on Aug. 31, 2004, the disclosure of which is hereby incorporated by reference in its entirety. In addition, step 328 may include the determination of whether those servers 216 a-216 n capable of performing the workload also meet the requirements of any service level agreements, security agreements, and the like.

Assuming that there are servers 216 a-216 n available to perform the shifted workload, the actual available capacities of the CRAC units 114 b-114 n in each of the remaining zones may be calculated by subtracting the respective actual loads from the rated capacities of the CRAC units 114 b-114 n. In addition, the resource manager 128 may identify which of the CRAC units 114 b-114 n are being least utilized as those CRAC units 114 b-114 n that have the highest available capacities. The resource manager 128 may shift the workload from the servers 216 a-216 n in the zone 130 a to the servers 216 a-216 n in the other zones 130 b-130 e based upon the available capacities of the CRAC units 114 b-114 n associated with those zones 130 b-130 e, at step 330. More particularly, for instance, the resource manager 128 may first shift the workload to those servers 216 a-216 n in zones 130 b-130 e having CRAC units 114 b-114 n with the highest available capacities and so forth.
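The ordering described for steps 328-330 may be sketched as a simple greedy allocation, shown below in Python. Expressing the workload to be shifted as a heat load in kW, the unit identifiers, and the sample values are assumptions of this sketch. Server availability, service level agreements, and security requirements are not modeled.

```python
# Illustrative sketch of steps 328-330: shift load to the remaining CRAC
# units with the highest available capacity first. Values are hypothetical.

def shift_workload(heat_to_shift_kw, rated_kw, actual_kw):
    """Return (placements, unplaced_kw) for the remaining CRAC units.

    rated_kw and actual_kw are dicts keyed by CRAC unit identifier.
    """
    # Available capacity = rated capacity - actual load (step 328).
    available = {u: rated_kw[u] - actual_kw[u] for u in rated_kw}
    placements = {}
    remaining = heat_to_shift_kw
    # Visit the least utilized units first (step 330).
    for unit in sorted(available, key=available.get, reverse=True):
        if remaining <= 0:
            break
        share = min(remaining, max(available[unit], 0.0))
        if share > 0:
            placements[unit] = share
            remaining -= share
    return placements, remaining


rated = {"114b": 100.0, "114c": 100.0, "114d": 100.0}
actual = {"114b": 80.0, "114c": 50.0, "114d": 70.0}
placements, unplaced_kw = shift_workload(60.0, rated, actual)
```

In this sample, unit 114c has the most headroom (50 kW) and is filled first, and the remaining 10 kW goes to unit 114d. A nonzero `unplaced_kw` would indicate load that could not be accommodated within the rated capacities.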

At step 332, the resource manager 128 may calculate the actual loads on the CRAC units 114 b-114 n to which the workload has been shifted. The actual loads may be calculated, for instance, in any of the manners described above with respect to step 310. At step 334, the resource manager 128 may determine whether the rated capacities of the CRAC units 114 b-114 n have been reached by comparing the actual loads calculated at step 332 with the rated capacities of the CRAC units 114 b-114 n. If it is determined that the rated capacities have not been reached, the resource manager 128 may determine whether there is additional workload to be shifted at step 336. If there is additional workload, steps 330-336 may be repeated. Otherwise, if there is no additional workload to be shifted, the operational mode 300 may be repeated beginning at step 304.

If, at step 334, it is determined that the rated capacities of the CRAC units 114 b-114 n have been reached, the resource manager 128 may determine the CRAC unit 114 b-114 n provisioning levels at step 338. The provisioning levels of the CRAC units 114 b-114 n may be determined in any of the manners described above with respect to step 304.

At step 340, the resource manager 128 may determine whether the provisioning levels of the CRAC units 114 b-114 n are relatively safe. The provisioning level of a CRAC unit 114 b-114 n may be considered unsafe if the CRAC unit 114 b-114 n is operating at a capacity higher than the rated capacity for that CRAC unit. The provisioning level of a CRAC unit 114 b-114 n may also be considered unsafe if the CRAC unit 114 b-114 n is operating at or above a predefined threshold percentage of the rated capacity for that CRAC unit. If it is determined that the CRAC units 114 b-114 n are operating at relatively safe levels, the operational mode 300 may be repeated beginning at step 306. If, however, it is determined that one or more of the CRAC units 114 b-114 n are operating at unsafe levels, the resource manager 128 may scale down the power in all of the servers 216 a-216 n in the zone 130 b-130 e associated with the one or more CRAC units 114 b-114 n operating at the unsafe levels at step 342. The power in the servers 216 a-216 n may be scaled down by lowering the performance or the p-states of processor(s) 222 contained in the servers 216 a-216 n, in any reasonably suitable conventional manner. In addition, the power may be scaled down to safe operating levels for the CRAC units 114 a-114 n.
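The safety determination of step 340 may be sketched in Python as follows. The 90 percent threshold, the unit identifiers, and the sample loads are hypothetical. A unit operating above its rated capacity is also at or above any threshold fraction of 1.0 or less, so a single comparison covers both conditions described above.

```python
# Illustrative sketch of step 340. The threshold value is hypothetical.

def unsafe_units(rated_kw, actual_kw, threshold_fraction=0.9):
    """Return identifiers of CRAC units at or above the threshold load."""
    return [unit for unit in rated_kw
            if actual_kw[unit] >= threshold_fraction * rated_kw[unit]]


rated = {"114b": 100.0, "114c": 100.0, "114d": 100.0}
actual = {"114b": 95.0, "114c": 60.0, "114d": 100.0}

# Zones served by the flagged units would have server power scaled down
# at step 342, for instance by lowering processor p-states.
flagged = unsafe_units(rated, actual)
```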

Following the scale down of power as indicated at step 342, the operational mode 300 may be repeated beginning at step 304. In addition, the operational mode 300 may be continued for a predetermined length of time, for a predetermined number of iterations, until the operational mode 300 is manually discontinued, etc.

According to another example, the data collected and calculated at some or all of steps 304-314 may be employed to generate a plurality of maps. As described above, the resource manager 128 may generate a plurality of different types of maps which may be used in determining when and where the workload is to be shifted. For instance, in the event that a CRAC unit 114 a has failed or is close to failing, the resource manager 128 may generate one or more projected deployment maps to generally forecast how shifting the workload may affect the loading on the CRAC units 114 a-114 n. The resource manager 128 may compare the projected deployment maps to determine which of the deployment projections provides the most beneficial results. In addition, the resource manager 128 may shift the workload based upon the selected map at step 330.

Although the deployment projection maps have been described as being generated following a determination of a CRAC unit 114 a-114 n failure or potential failure, the deployment projection maps may be generated at any time prior to that determination. For instance, a plurality of deployment projection maps configured to depict potential failure mitigation scenarios may be generated at any time prior to the determination of a CRAC unit 114 a-114 n failure or potential failure. In any regard, these maps may be relied upon by the resource manager 128 to relatively quickly determine an appropriate failure mitigation scenario in the event of a CRAC unit 114 a-114 n failure or potential failure.
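The comparison of projected deployment maps may be sketched in Python as follows. The description does not define what makes a projection "most beneficial", so this sketch assumes, purely for illustration, that the preferred map is the one with the lowest peak utilization across the remaining CRAC units. The map names and projected loads are hypothetical.

```python
# Illustrative sketch of comparing projected deployment maps.
# The selection criterion (lowest peak utilization) is an assumption.

def peak_utilization(projected_kw, rated_kw):
    """Return the highest load-to-rated-capacity ratio in a projection."""
    return max(projected_kw[u] / rated_kw[u] for u in rated_kw)


def select_map(candidate_maps, rated_kw):
    """Return the name of the candidate map with the lowest peak."""
    return min(candidate_maps,
               key=lambda name: peak_utilization(candidate_maps[name],
                                                 rated_kw))


rated = {"114b": 100.0, "114c": 100.0, "114d": 100.0}
candidate_maps = {
    "fill_114c": {"114b": 80.0, "114c": 100.0, "114d": 80.0},
    "spread": {"114b": 90.0, "114c": 85.0, "114d": 85.0},
}
selected = select_map(candidate_maps, rated)
```

With these sample projections the "spread" map is selected, since its most heavily loaded unit runs at 90 percent of rated capacity rather than 100 percent.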

With reference now to FIG. 4, there is shown an operational mode 400 of a method for server 216 a-216 n power management to ensure that there is sufficient capacity in the CRAC units in the event of a CRAC unit failure. It is to be understood that the following description of the operational mode 400 is but one manner of a variety of different manners in which an embodiment of the invention may be practiced. It should also be apparent to those of ordinary skill in the art that the operational mode 400 represents a generalized illustration and that other steps may be added or existing steps may be removed, modified or rearranged without departing from the scope of the operational mode 400.

The description of the operational mode 400 is made with reference to the block diagram 200 illustrated in FIG. 2, and thus makes reference to the elements cited therein. It should, however, be understood that the operational mode 400 is not limited to the elements set forth in the block diagram 200. Instead, it should be understood that the operational mode 400 may be practiced by a workload placement system having a different configuration than that set forth in the block diagram 200.

The operational mode 400 may be implemented to scale down power in each zone of influence 130 a-130 e to ensure that there is sufficient capacity in the CRAC units 114 a-114 n should at least one of the CRAC units 114 a-114 n fail. Thus, for instance, should a CRAC unit 114 a fail, there may be sufficient capacity in the remaining CRAC units 114 b-114 n to cool the servers 216 a-216 n to which the workload of the servers 216 a-216 n contained in the zone 130 a of the CRAC unit 114 a is shifted. In one respect, through implementation of the operational mode 400, the potential reduction in workflow caused by a CRAC unit 114 a failure may substantially be minimized.

As shown, the operational mode 400 includes all of the steps 302-314 disclosed and described herein above with respect to the operational mode 300. Therefore, a detailed description of steps 302-314 will be omitted and the description with respect to the operational mode 300 will be relied upon as providing a sufficient description of these steps.

Therefore, beginning at step 402, the resource manager 128 may determine whether the sum of the actual capacity utilizations (AC) of the CRAC units 114 a-114 n exceeds or equals a value X. The value X may be defined as a predefined percentage of the sum of the rated capacities. In addition, the value X may also comprise a user-defined value and may be based upon a variety of factors. By way of example, the value X may be determined based upon levels of loading on individual CRAC units 114 a-114 n. In this example, the value X may be weighted, for instance, according to the proximity of the CRAC units 114 a-114 n to their rated maximum capacities. Thus, if a CRAC unit 114 a is operating close to its rated maximum capacity, this fact may be given relatively greater weight in determining the value of X. As another example, the value X may be determined based upon factors including redundancy in cooling, response time of the CRAC units 114 a-114 n, air delivery infrastructure in the data center 100, etc.

If it is determined that the sum of the actual capacity utilizations of the CRAC units 114 a-114 n falls below the value X, the operational mode 400 may be repeated beginning at step 306. If, however, it is determined that the sum of the actual capacity utilizations of the CRAC units 114 a-114 n equals or exceeds the value X, the resource manager 128 may scale down power in the servers 216 a-216 n contained in each zone of influence 130 a-130 e, as indicated at step 404. The power in the servers 216 a-216 n may be scaled down by lowering the performance or the p-states of processor(s) 222 contained in the servers 216 a-216 n as described herein above.
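The comparison at step 402 may be sketched in Python as follows, taking the value X as a predefined percentage of the sum of the rated capacities. The 75 percent figure and the sample loads are hypothetical, and the weighting of X by individual unit loading described above is not modeled.

```python
# Illustrative sketch of step 402. The fraction used for X is hypothetical.

def should_scale_down(actual_kw, rated_kw, fraction=0.75):
    """Return True if the summed actual loads equal or exceed X."""
    x_kw = fraction * sum(rated_kw)  # value X
    return sum(actual_kw) >= x_kw


rated_kw = [100.0, 100.0, 100.0, 100.0]  # X = 0.75 * 400 kW = 300 kW

# 250 kW total is below X, so no scale down would be triggered.
below_x = should_scale_down([70.0, 60.0, 65.0, 55.0], rated_kw)

# 310 kW total is at or above X, so power would be scaled down (step 404).
above_x = should_scale_down([80.0, 80.0, 75.0, 75.0], rated_kw)
```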

Following the scale down of power as indicated at step 404, the operational mode 400 may be repeated beginning at step 304. In addition, the operational mode 400 may be continued for a predetermined length of time, for a predetermined number of iterations, until the operational mode 400 is manually discontinued, etc.

The operations set forth in the operational modes 300 and 400 may be contained as a utility, program, or subprogram, in any desired computer accessible medium. In addition, the operational modes 300 and 400 may be embodied by a computer program, which can exist in a variety of forms both active and inactive. For example, it can exist as software program(s) comprised of program instructions in source code, object code, executable code or other formats. Any of the above can be embodied on a computer readable medium, which includes storage devices and signals, in compressed or uncompressed form.

Exemplary computer readable storage devices include conventional computer system RAM, ROM, EPROM, EEPROM, and magnetic or optical disks or tapes. Exemplary computer readable signals, whether modulated using a carrier or not, are signals that a computer system hosting or running the computer program can be configured to access, including signals downloaded through the Internet or other networks. Concrete examples of the foregoing include distribution of the programs on a CD ROM or via Internet download. In a sense, the Internet itself, as an abstract entity, is a computer readable medium. The same is true of computer networks in general. It is therefore to be understood that any electronic device capable of executing the above-described functions may perform those functions enumerated above.

FIG. 5 illustrates a computer system 500, which may be employed to perform the various functions of the resource manager 128 described hereinabove, according to an embodiment. In this respect, the computer system 500 may be used as a platform for executing one or more of the functions described hereinabove with respect to the resource manager 128.

The computer system 500 includes one or more controllers, such as a processor 502. The processor 502 may be used to execute some or all of the steps described in the operational modes 300 and 400. Commands and data from the processor 502 are communicated over a communication bus 504. The computer system 500 also includes a main memory 506, such as a random access memory (RAM), where the program code for, for instance, the resource manager 128, may be executed during runtime, and a secondary memory 508. The secondary memory 508 includes, for example, one or more hard disk drives 510 and/or a removable storage drive 512, representing a floppy diskette drive, a magnetic tape drive, a compact disk drive, etc., where a copy of the program code for the provisioning system may be stored.

The removable storage drive 512 reads from and/or writes to a removable storage unit 514 in a well-known manner. User input and output devices may include a keyboard 516, a mouse 518, and a display 520. A display adaptor 522 may interface with the communication bus 504 and the display 520 and may receive display data from the processor 502 and convert the display data into display commands for the display 520. In addition, the processor 502 may communicate over a network, for instance, the Internet, LAN, etc., through a network adaptor 524.

It will be apparent to one of ordinary skill in the art that other known electronic components may be added or substituted in the computer system 500. In addition, the computer system 500 may include a system board or blade used in a rack in a data center, a conventional “white box” server or computing device, etc. Also, one or more of the components in FIG. 5 may be optional (for instance, user input devices, secondary memory, etc.).

What has been described and illustrated herein is a preferred embodiment of the invention along with some of its variations. The terms, descriptions and figures used herein are set forth by way of illustration only and are not meant as limitations. Those skilled in the art will recognize that many variations are possible within the spirit and scope of the invention, which is intended to be defined by the following claims, and their equivalents, in which all terms are meant in their broadest reasonable sense unless otherwise indicated.

1. A method of workload placement based upon capacity utilizations of aplurality of CRAC units, said method comprising: determining theprovisioning of the plurality of CRAC units; determining the zone ofinfluence for each of the plurality of CRAC units based upon theprovisioning of the plurality of CRAC units; determining whether a CRACunit of the plurality of CRAC units is at least one of near failure orhas failed; and shifting the workload from a zone of influence of atleast one CRAC unit to a zone of influence of another of the pluralityof CRAC units in response to a determination that a CRAC unit is atleast one of near failure or has failed.
 2. The method according toclaim 1, further comprising: calculating actual loads on the pluralityof CRAC units, calculating the difference between a sum of the actualloads and a sum of the rated capacities of the plurality of CRAC units;and wherein the step of shifting the workload comprises shifting theworkload in response to the sum of the actual loads falling below thesum of the rated capacities.
 3. The method according to claim 1, furthercomprising: determining whether the remaining plurality of CRAC unitshas sufficient capacity to redistribute the workload; and wherein thestep of shifting the workload comprises shifting the workload inresponse to a determination that there is sufficient capacity in theremaining plurality of CRAC units.
 4. The method according to claim 3,further comprising: calculating the actual available capacity in each ofthe remaining plurality of CRAC units in response to a determinationthat there is sufficient capacity; and wherein the step of shifting theworkload comprises shifting the workload to the zones of influence ofthe remaining plurality of CRAC units based on the actual availablecapacities of the remaining plurality of CRAC units.
 5. The methodaccording to claim 4, further comprising: re-calculating actual loads onthe remaining plurality of CRAC units; and determining whether theactual loads on the remaining plurality of CRAC units have reached therated capacities of the remaining plurality of CRAC units.
 6. The methodaccording to claim 5, further comprising: determining whether additionalworkload is to be shifted in response to the actual loads an theremaining plurality of CRAC units falling below the rated capacities ofthe remaining plurality of CRAC units; and shifting the workload to thezones of influence of the remaining plurality of CRAC units based on theactual available capacities of the remaining plurality of CRAC units inresponse to a determination that additional workload is to be shifted.7. The method according to claim 5, further comprising: determining theprovisioning levels of the remaining plurality of CRAC units in responseto the actual loads on the remaining plurality of CRAC units reachingthe rated capacities of the remaining plurality of CRAC units;determining whether the provisioning levels of the CRAC units is safe;and scaling down power in servers of a zone of influence of a CRAC unitin response to a determination that the provisioning level of the CRACunit is unsafe.
 8. The method according to claim 7, wherein the servershave at least one processor, and wherein the step of scaling down powerin servers comprises scaling down at least one or both of theperformance and p-states of the at least one processor.
 9. The methodaccording to claim 1, further comprising projecting results of one ormore workload shifting scenarios; generating at least one projecteddeployment map based upon projected results; selecting one of the atleast one projected deployment maps; and wherein the step of shiftingthe workload comprises shifting the workload according to the selectedone of the at least one projected deployment map.
 10. The method according to claim 1, further comprising: generating a plurality of failure mitigation maps configured to model a plurality of workload distribution scenarios; and wherein the step of shifting the workload comprises shifting the workload according to one of the plurality of failure mitigation maps.
 11. The method according to claim 1, wherein at least one of the plurality of CRAC units is located in a separate data center from the remaining plurality of CRAC units, and wherein the step of shifting the workload comprises shifting the workload to a zone of influence of the at least one of the CRAC units in the separate data center.
 12. A method of providing sufficient capacity in a plurality of CRAC units to compensate for the failure of at least one of the plurality of CRAC units, said method comprising: determining the provisioning of the plurality of CRAC units; determining the zone of influence for each of the plurality of CRAC units based upon the provisioning of the plurality of CRAC units; calculating actual loads of the plurality of CRAC units; determining whether a sum of the actual loads equals or exceeds a predefined percentage of a sum of the rated capacities of the plurality of CRAC units; and scaling down power in servers in each zone of influence in response to the sum of the actual loads equaling or exceeding the predefined percentage of the sum of the rated capacities, to thereby provide sufficient capacity in the plurality of CRAC units to compensate for the failure of at least one of the plurality of CRAC units.
 13. The method according to claim 12, wherein the servers have at least one processor and wherein the step of scaling down power in servers comprises scaling down at least one or both of the performance and p-states of the at least one processor.
 14. The method according to claim 12, wherein the step of scaling down power in the servers comprises scaling down power in the servers to a level that causes the sum of the capacity utilizations to fall below the predefined percentage of the sum of the rated capacities.
 15. A workload placement system comprising: a resource manager; a memory in communication with the resource manager, said memory comprising a module for calculating the actual loads of a plurality of CRAC units, said memory further comprising a module for determining placement of workload among a plurality of servers based upon the actual loads of the plurality of CRAC units; wherein the resource manager is configured to determine available capacities in the plurality of CRAC units in response to at least one of a failure and a potential failure in a CRAC unit of the plurality of CRAC units; and wherein the resource manager is further configured to shift the workload to at least one server in a zone of influence of at least one of the plurality of CRAC units having available capacity.
 16. The workload placement system according to claim 15, wherein the resource manager is further configured to scale down power in one or more of the servers of a zone of influence of a CRAC unit.
 17. The workload placement system according to claim 16, wherein the servers have at least one processor, and wherein the resource manager is configured to scale down at least one or both of the performance and p-states of the at least one processor.
 18. The workload placement system according to claim 15, wherein the resource manager is further configured to generate at least one map depicting at least one projected deployment scenario, and wherein the resource manager is further configured to shift the workload based upon the at least one map.
 19. The workload placement system according to claim 15, wherein at least one of the plurality of CRAC units is located in a separate data center from the remaining plurality of CRAC units, and wherein the resource manager is further configured to shift the workload to at least one server in a zone of influence of the at least one of the plurality of CRAC units located in the separate data center.
 20. A data center comprising: a plurality of CRAC units; means for determining the provisioning of the plurality of CRAC units; means for determining the zone of influence for each of the plurality of CRAC units; means for determining whether a CRAC unit of the plurality of CRAC units is at least one of near failure or has failed; and means for shifting the workload from a zone of influence of at least one CRAC unit to a zone of influence of another of the plurality of CRAC units based upon whether a CRAC unit has been determined to be at least one of near failure or has failed.
 21. The data center according to claim 20, further comprising: means for scaling down at least one or both of the performance and p-states of at least one processor.
 22. The data center according to claim 20, further comprising: means for generating at least one map depicting at least one projected deployment scenario, and wherein the means for shifting the workload is configured to shift the workload based upon the at least one map.
 23. A computer readable storage medium on which is embedded one or more computer programs, said one or more computer programs implementing a method of workload placement based upon capacity utilizations of a plurality of CRAC units, said one or more computer programs comprising a set of instructions for: determining the provisioning of the plurality of CRAC units; determining the zone of influence for each of the plurality of CRAC units based upon the provisioning of the plurality of CRAC units; determining whether a CRAC unit of the plurality of CRAC units is at least one of near failure or has failed; and shifting the workload from a zone of influence of at least one CRAC unit to a zone of influence of another of the plurality of CRAC units in response to a determination that a CRAC unit is at least one of near failure or has failed.
 24. The computer readable storage medium according to claim 23, said one or more computer programs further comprising a set of instructions for: calculating actual loads on the plurality of CRAC units; calculating the difference between a sum of the actual loads and a sum of the rated capacities of the plurality of CRAC units; and wherein the step of shifting the workload comprises shifting the workload in response to the sum of the actual loads falling below the sum of the rated capacities.
 25. The computer readable storage medium according to claim 23, said one or more computer programs further comprising a set of instructions for: projecting results of one or more workload shifting scenarios; generating at least one projected deployment map based upon projected results; selecting one of the at least one projected deployment maps; and wherein the step of shifting the workload comprises shifting the workload according to the selected one of the at least one projected deployment map.
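For illustration only, the capacity checks recited in claims 4 to 6 and claim 12 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation: the `Crac` class, the kW units, the 0.8 default threshold, and the greedy largest-available-first allocation are all hypothetical choices that the claims do not specify.

```python
from dataclasses import dataclass


@dataclass
class Crac:
    """Hypothetical record for one CRAC unit (names and units are assumptions)."""
    name: str
    rated_kw: float      # rated cooling capacity
    load_kw: float       # actual load from its zone of influence
    failed: bool = False

    @property
    def available_kw(self) -> float:
        # A failed unit offers no capacity; otherwise rated minus actual load.
        return 0.0 if self.failed else max(self.rated_kw - self.load_kw, 0.0)


def needs_power_scaling(units, threshold=0.8):
    """Claim 12 style test: does the sum of actual loads equal or exceed
    a predefined percentage (threshold) of the sum of rated capacities?"""
    total_load = sum(u.load_kw for u in units)
    total_rated = sum(u.rated_kw for u in units)
    return total_load >= threshold * total_rated


def shift_workload(units, failed_name):
    """Shift the failed unit's load into the zones of the remaining units
    based on their available capacities (assumed greedy, largest first).
    Returns (plan, unplaced_kw), where plan maps unit name to kW received."""
    failed = next(u for u in units if u.name == failed_name)
    failed.failed = True
    remaining = failed.load_kw
    plan = {}
    survivors = sorted(
        (u for u in units if not u.failed),
        key=lambda u: u.available_kw,
        reverse=True,
    )
    for u in survivors:
        if remaining <= 0:
            break
        moved = min(u.available_kw, remaining)
        if moved > 0:
            u.load_kw += moved       # actual load is re-calculated after the shift
            plan[u.name] = moved
            remaining -= moved
    failed.load_kw = remaining       # load that could not be placed elsewhere
    return plan, remaining


units = [
    Crac("A", rated_kw=100.0, load_kw=60.0),
    Crac("B", rated_kw=100.0, load_kw=50.0),
    Crac("C", rated_kw=100.0, load_kw=70.0),
]
scale_first = needs_power_scaling(units)          # 180 kW against 0.8 * 300 kW
plan, unplaced = shift_workload(units, "A")
```

In this sketch, a nonzero `unplaced` value would correspond to the case where the remaining units have reached their rated capacities, at which point scaling down server power, as in claim 7, would be the fallback.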